
Optimization: Check for implicit anchor#165

Closed
Andersama wants to merge 1 commit into hanickadot:main from Andersama:implicit_anchor


@Andersama
Contributor

Slight improvement on #95

From O'Reilly's book:

An engine with this optimization realizes that if a regex begins with .* or .+ and has no global alternation an implicit ^ can be prepended to the regex.

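The intuition is that a leading .* or .+ already absorbs any initial characters that are not part of the real match, and a greedy .* or .+ consumes the entire input in one go before backtracking. Bumping the start position forward and retrying is therefore much like remaining inside the .* or .+: if the attempt from the first position fails, a later start position cannot succeed either.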
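To make the intuition concrete, here is a minimal sketch that uses `std::regex` as a stand-in (it is not CTRE and not the code in this PR). The helper names are hypothetical. It assumes single-line input, since `.` does not match line terminators in the ECMAScript grammar, and a pattern with no top-level alternation. `match_continuous` restricts the engine to a match that begins at the first character, which plays the role of the implicit `^`.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Unanchored search: the engine is free to retry the pattern at later start offsets.
bool search_anywhere(const std::regex& re, const std::string& s) {
    return std::regex_search(s, re);
}

// Implicitly anchored search: only a match starting at offset 0 is accepted.
bool search_at_start_only(const std::regex& re, const std::string& s) {
    return std::regex_search(s, re, std::regex_constants::match_continuous);
}

// For a pattern that begins with .* (single-line input, no top-level
// alternation), both searches are expected to give the same answer.
void check_equivalence(const std::regex& re, const std::string& s) {
    assert(search_anywhere(re, s) == search_at_start_only(re, s));
}
```

For a pattern such as `.*foo` the two functions should agree on any single-line input, whereas for plain `foo` they differ as soon as the match does not start at offset 0.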

@Andersama
Contributor Author

Andersama commented Jan 6, 2021

Example: https://gcc.godbolt.org/z/3WvT98bfE

@Andersama
Contributor Author

In testing without the implicit anchor, searching for .*pattern, I could not get the benchmark to complete (there is an excessive amount of backtracking). With this check, however, it seems to run roughly half as fast as searching for pattern directly. It is probably worth extracting the .* from the pattern and running the search directly.
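As a rough sketch of that last idea, again with `std::regex` rather than CTRE, the leading `.*` can be dropped from the pattern text and the remainder searched for directly. Both helper names are hypothetical, and the sketch assumes the caller has already ruled out top-level alternation and is working with single-line input. It only answers whether a match exists; the reported match span would differ from that of the original pattern.

```cpp
#include <cstddef>
#include <regex>
#include <string>

// Hypothetical helper: drop one leading ".*" (or lazy ".*?") from a pattern string.
// Assumes the pattern has no top-level alternation.
std::string strip_leading_dot_star(const std::string& pattern) {
    if (pattern.compare(0, 2, ".*") != 0) {
        return pattern;  // nothing to strip
    }
    std::size_t prefix_len = 2;
    if (pattern.size() > 2 && pattern[2] == '?') {
        prefix_len = 3;  // also remove the lazy marker so the remainder stays valid
    }
    return pattern.substr(prefix_len);
}

// Reports whether the pattern matches somewhere in s by searching
// for the remainder only, without the leading ".*".
bool contains_match(const std::string& pattern, const std::string& s) {
    const std::regex tail(strip_leading_dot_star(pattern));
    return std::regex_search(s, tail);
}
```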

Improves code generation for cases where .* can be treated as an anchor